Online Class-Incremental Learning (CIL) is a challenging setting in Continual Learning (CL), in which data of new tasks arrive as streams and online models must learn from each incoming stream without revisiting previous ones. Existing works characterize each class by a single centroid that is adapted to the incoming data. This approach may be limiting when the data stream of a class is naturally multimodal. To address this issue, we first propose an online mixture model learning approach built on the well-established theory of optimal transport (OT-MM). Specifically, the centroids and covariance matrices of the mixture model are adapted incrementally as new data arrive. The advantages are two-fold: (i) we can characterize complex data streams more accurately, and (ii) by using the per-class centroids produced by OT-MM, we can estimate the similarity of an unseen example to each class more reliably at inference time. Moreover, to combat catastrophic forgetting in the CIL scenario, we further propose Dynamic Preservation. After dynamic preservation is applied across data streams, the latent representations of the classes in the old and new tasks become more compact within each class and more separated from one another. Together with a contraction feature extractor, this technique helps the model mitigate catastrophic forgetting. Experimental results on real-world datasets show that our proposed method can significantly outperform the current state-of-the-art baselines.
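The abstract states that the centroids and covariance matrices of each class's mixture are updated incrementally. The sketch below shows only generic bookkeeping for such updates, under two assumptions: each incoming feature vector is hard-assigned to its nearest centroid by Euclidean distance (a hypothetical stand-in, since OT-MM derives its updates from optimal transport, which is not reproduced here), and the chosen component's mean and covariance are then updated with Welford's running formulas. The class names `OnlineGaussian` and `NearestCentroidMixture` are hypothetical.

```python
import numpy as np


class OnlineGaussian:
    """Running mean and covariance of one mixture component (Welford-style update)."""

    def __init__(self, dim: int):
        self.n = 0
        self.mean = np.zeros(dim)
        self._m2 = np.zeros((dim, dim))  # running sum of outer products of deviations

    def update(self, x: np.ndarray) -> None:
        self.n += 1
        delta = x - self.mean                       # deviation from the old mean
        self.mean += delta / self.n                 # incremental mean update
        self._m2 += np.outer(delta, x - self.mean)  # old-mean and new-mean deviations

    @property
    def cov(self) -> np.ndarray:
        # Population covariance; all zeros until at least one sample has arrived.
        return self._m2 / self.n if self.n > 0 else self._m2.copy()


class NearestCentroidMixture:
    """Hypothetical per-class mixture: each sample updates only its closest component."""

    def __init__(self, init_centroids: np.ndarray):
        self.components = []
        for c in init_centroids:
            g = OnlineGaussian(len(c))
            g.update(np.asarray(c, dtype=float))  # seed the component with its initial centroid
            self.components.append(g)

    def update(self, x: np.ndarray) -> int:
        dists = [np.linalg.norm(x - g.mean) for g in self.components]
        k = int(np.argmin(dists))
        self.components[k].update(x)
        return k  # index of the component that absorbed x

    def score(self, x: np.ndarray) -> float:
        # Similarity of x to this class: negative distance to the nearest centroid.
        return -min(np.linalg.norm(x - g.mean) for g in self.components)
```

Keeping several centroids per class lets a multimodal class be scored by its closest mode instead of by a single averaged centroid that may lie between modes.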
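We consider how to effectively use prior knowledge when learning a Bayesian model in a streaming environment, where data arrive sequentially and without end. This problem is very important in the era of data explosion, with rich sources of valuable external knowledge such as pre-trained models, ontologies, and Wikipedia. We show that some existing methods can forget any prior knowledge. We then propose a novel framework that enables different forms of prior knowledge to be incorporated into a base Bayesian model for data streams. Our framework subsumes some existing popular models for time-series/dynamic data. Extensive experiments show that our framework outperforms existing methods by a large margin. In particular, our framework can help Bayesian models generalize on extremely short texts, while other methods overfit. The implementation of our framework is available at https://github.com/bachtranxuan/tps.git.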
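As a minimal illustration of the streaming setting (not the framework proposed above), the sketch below uses the textbook recursive Bayesian update for a conjugate Beta-Bernoulli model: the posterior after each minibatch becomes the prior for the next, and prior knowledge enters only once, as the initial pseudo-counts. Because those pseudo-counts are never re-injected, their relative weight shrinks as data accumulate, which is one simple way prior knowledge can fade in a stream. The names `BetaPosterior` and `stream` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class BetaPosterior:
    alpha: float  # pseudo-count of 1s (prior knowledge is encoded here initially)
    beta: float   # pseudo-count of 0s

    def update(self, minibatch: Iterable[int]) -> "BetaPosterior":
        batch = list(minibatch)
        ones = sum(batch)
        # Conjugate update: add observed counts to the current pseudo-counts.
        return BetaPosterior(self.alpha + ones, self.beta + len(batch) - ones)

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def stream(prior: BetaPosterior, batches: List[List[int]]) -> BetaPosterior:
    post = prior
    for b in batches:
        post = post.update(b)  # yesterday's posterior is today's prior
    return post
```

For a conjugate model, processing the minibatches one by one gives exactly the same posterior as processing all the data at once.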
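Learning hidden topics from data streams has become absolutely necessary, but it poses challenging problems such as concept drift as well as short and noisy data. Using prior knowledge to enrich a topic model is one potential solution to these challenges. Prior knowledge derived from human knowledge (e.g., WordNet) or from pre-trained models (e.g., Word2Vec) is very valuable and can help topic models work better. However, in a streaming environment where data arrive continually and without end, existing studies are limited in how effectively they exploit these resources. In particular, knowledge graphs, which contain meaningful word relations, are ignored. In this paper, to exploit knowledge graphs effectively, we propose a novel Graph Convolutional Topic Model (GCTM), which integrates graph convolutional networks (GCN) into a topic model, together with a learning method that learns the networks and the topic model simultaneously from data streams. In each minibatch, our method not only exploits an external knowledge graph but also balances the external and old knowledge to perform well on new data. We conduct extensive experiments to evaluate our method with both a human knowledge graph (WordNet) and a graph built from pre-trained word embeddings (Word2Vec). The experimental results show that our method achieves better performance than state-of-the-art baselines in terms of probabilistic predictive measures and topic coherence. In particular, our method works well when dealing with short texts as well as concept drift. The implementation of GCTM is available at https://github.com/bachtranxuan/gctm.git.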
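For readers unfamiliar with GCNs, here is a minimal sketch of one standard graph-convolution layer (the Kipf and Welling formulation, ReLU(D^{-1/2}(A + I)D^{-1/2} H W)) applied to a toy word graph. It assumes this common formulation; how GCTM feeds the GCN output into the topic model is not shown. The functions `normalize_adjacency` and `gcn_layer` and the three-word graph are hypothetical illustrations.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with added self-loops."""
    a_hat = adj + np.eye(adj.shape[0])  # every word keeps its own features
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph convolution: each word averages its neighbours' features, then projects."""
    return np.maximum(adj_norm @ h @ w, 0.0)  # ReLU activation


# Hypothetical 3-word graph (e.g. edges from WordNet relations): 0-1 and 1-2 are related.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
a_norm = normalize_adjacency(adj)
features = np.eye(3)          # one-hot input features per word
weights = np.ones((3, 2))     # toy projection to 2 dimensions
embeddings = gcn_layer(a_norm, features, weights)
```

The normalization keeps high-degree words from dominating their neighbours' representations.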
Diabetic Retinopathy (DR) is a leading cause of vision loss worldwide, and early DR detection is necessary to prevent vision loss and support appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segmentation. DRG-Net consists of two modules: (i) DRG-AI-System, which classifies DR grade, localizes lesion areas, and provides visual explanations; and (ii) DRG-Expert-Interaction, which receives feedback from expert users and improves the DRG-AI-System. To deal with sparse data, we use transfer learning mechanisms to extract invariant feature representations via the Wasserstein distance and adversarial learning-based entropy minimization. In addition, we propose a novel attention strategy at both low- and high-level features to automatically select the most significant lesion information and provide explainable properties. In terms of human interaction, we further develop DRG-Net as a tool that enables expert users to correct the system's predictions, which can then be used to update the system as a whole. Moreover, thanks to the attention mechanism and the loss-function constraints between lesion features and classification features, our approach is robust to a certain level of noise in user feedback. We have benchmarked DRG-Net on the two largest DR datasets, IDRiD and FGADR, and compared it with various state-of-the-art deep learning networks. In addition to outperforming other SOTA approaches, DRG-Net is effectively updated using user feedback, even in a weakly supervised manner.
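The abstract names two quantities used for transfer learning: the Wasserstein distance and prediction entropy. As a minimal illustration only, the sketch below computes the Wasserstein-1 distance between two equal-size one-dimensional empirical samples (the mean absolute difference of their sorted values) and the Shannon entropy of a class-probability vector, the quantity that entropy minimization drives down. DRG-Net's actual feature-level losses and adversarial training are not reproduced; the function names are hypothetical.

```python
import math
from typing import Sequence


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """W1 between two equal-size 1-D empirical distributions."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    # In 1-D the optimal transport plan matches the i-th smallest of xs to the i-th smallest of ys.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution; lower means more confident."""
    return -sum(p * math.log(p) for p in probs if p > 0)
```

Minimizing the first pulls source and target feature distributions together; minimizing the second pushes predictions on unlabeled target data toward confident decisions.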
Current language models are considered to have sub-human capabilities at natural language tasks like question answering or writing code. However, language models are not trained to perform well at these tasks; they are trained to accurately predict the next token given the previous tokens in tokenized text. It is not clear whether language models are better or worse than humans at next-token prediction. To answer this question, we performed two distinct experiments that directly compare humans and language models on this front: one measuring top-1 accuracy and the other measuring perplexity. In both experiments, we find humans to be consistently worse than even relatively small language models like GPT3-Ada at next-token prediction.
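The two metrics mentioned above can be sketched as follows, assuming each predictor (human or model) supplies a probability distribution over candidate next tokens at every position. Top-1 accuracy counts how often the most probable token is the true one, and perplexity is the exponential of the mean negative log-probability assigned to the true tokens. How the study elicits probabilities from humans is not shown; the function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List


def top1_accuracy(predictions: List[Dict[str, float]], targets: List[str]) -> float:
    """Fraction of positions where the highest-probability token is the true next token."""
    hits = sum(max(p, key=p.get) == t for p, t in zip(predictions, targets))
    return hits / len(targets)


def perplexity(predictions: List[Dict[str, float]], targets: List[str]) -> float:
    """exp of the average negative log-probability assigned to the true next tokens."""
    nll = -sum(math.log(p[t]) for p, t in zip(predictions, targets)) / len(targets)
    return math.exp(nll)


# Toy example: two positions with hypothetical predicted distributions.
preds = [{"cat": 0.5, "dog": 0.25, "the": 0.25}, {"sat": 0.25, "ran": 0.75}]
truth = ["cat", "sat"]
```

Here the predictor is right at one of two positions (accuracy 0.5), and its perplexity is exp((ln 2 + ln 4) / 2) = sqrt(8), about 2.83. Perplexity rewards well-calibrated probabilities even when the top guess is wrong.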
Random graph models with community structure have been studied extensively in the literature. For both detecting and recovering community structure, an interesting landscape of statistical and computational phase transitions has emerged. A natural unanswered question is: might it be possible to infer properties of the community structure (for instance, the number and sizes of communities) even in situations where actually finding those communities is believed to be computationally hard? We show the answer is no. In particular, we consider certain hypothesis testing problems between models with different community structures, and we show (in the low-degree polynomial framework) that testing between two options is as hard as finding the communities. In addition, our methods give the first computational lower bounds for testing between two different "planted" distributions, whereas previous results have considered testing between a planted distribution and an i.i.d. "null" distribution.
Recent years have witnessed an increasing number of applications of deep neural networks to tasks that require superior cognitive abilities, e.g., playing Go, generating art, and question answering (as in ChatGPT). Such dramatic progress raises the question: how generalizable are neural networks in solving problems that demand broad skills? To answer this question, we propose SMART, a Simple Multimodal Algorithmic Reasoning Task, and the associated SMART-101 dataset for evaluating the abstraction, deduction, and generalization abilities of neural networks in solving visuo-linguistic puzzles designed specifically for children aged 6 to 8. Our dataset consists of 101 unique puzzles; each puzzle comprises a picture and a question, and solving it requires a mix of several elementary skills, including arithmetic, algebra, and spatial reasoning, among others. To scale our dataset for training deep neural networks, we programmatically generate entirely new instances of each puzzle while retaining its solution algorithm. To benchmark performance on the SMART-101 dataset, we propose a vision-and-language meta-learning model using various state-of-the-art backbone neural networks. Our experiments reveal that while powerful deep models offer reasonable performance on the puzzles they are trained on, they perform no better than chance when analyzed for generalization. We also evaluate the recent ChatGPT large language model on a subset of our dataset and find that while ChatGPT's responses show convincing reasoning, its answers are often incorrect.
Trajectory-User Linking (TUL) is a relatively new mobility classification task in which anonymous trajectories are linked to the users who generated them. With applications ranging from personalized recommendations to criminal activity detection, TUL has received increasing attention over the past five years. While research has focused mainly on learning deep representations that capture complex spatio-temporal mobility patterns unique to individual users, we demonstrate that visit patterns are highly distinctive across users, so simple heuristics applied directly to the raw data are sufficient to solve TUL. More specifically, we demonstrate that a single check-in per trajectory is enough to correctly predict the identity of the user up to 85% of the time. Moreover, by using a non-parametric classifier, we scale TUL up to over 100k users, an increase of three orders of magnitude over the state of the art. Extensive empirical analysis on four real-world datasets (Brightkite, Foursquare, Gowalla and Weeplaces) compares our findings to state-of-the-art results and, more importantly, validates our claim that TUL is easier than commonly believed.
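To make the "single check-in" idea concrete, here is a hypothetical heuristic: index training check-ins by location, then attribute an anonymous trajectory to the user who visited its one observed location most often. The abstract does not specify the exact heuristic or classifier used, so this lookup table (which is non-parametric in the sense that it simply stores the data) is only an illustration; `fit_location_index` and `predict_user` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Optional, Tuple


def fit_location_index(
    trajectories: List[Tuple[str, List[Hashable]]],
) -> Dict[Hashable, Counter]:
    """Map each location ID to a count of how often each user checked in there."""
    index: Dict[Hashable, Counter] = defaultdict(Counter)
    for user, checkins in trajectories:
        for loc in checkins:
            index[loc][user] += 1
    return index


def predict_user(index: Dict[Hashable, Counter], checkin: Hashable) -> Optional[str]:
    """Predict the owner of a trajectory from one check-in: its most frequent visitor."""
    counts = index.get(checkin)  # .get avoids inserting unseen keys into the defaultdict
    if not counts:
        return None  # location never seen during training
    return counts.most_common(1)[0][0]


# Hypothetical training data: (user, sequence of location IDs).
train = [("alice", ["cafe_1", "gym_7", "cafe_1"]), ("bob", ["bar_3", "gym_7", "gym_7"])]
idx = fit_location_index(train)
```

Such a lookup scales with the number of distinct locations rather than requiring a model with one output per user, which hints at why non-parametric approaches can reach very large user counts.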
Recent advances in safety-critical risk-aware control are predicated on a priori knowledge of the disturbances a system might face. This paper proposes a method to efficiently learn these disturbances online, in a risk-aware context. First, we introduce the concept of a Surface-at-Risk, a risk measure for stochastic processes that extends Value-at-Risk, a commonly used risk measure in the risk-aware controls community. Second, we model the norm of the state discrepancy between the model and the true system evolution as a scalar-valued stochastic process and determine an upper bound to its Surface-at-Risk via Gaussian Process Regression. Third, we provide theoretical results on the accuracy of our fitted surface subject to mild assumptions that are verifiable with respect to the data sets collected during system operation. Finally, we experimentally verify our procedure by augmenting a drone's controller and highlight performance increases achieved via our risk-aware approach after collecting less than a minute of operating data.
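Since Surface-at-Risk extends Value-at-Risk, a small sketch of empirical VaR may help. It assumes the convention VaR_alpha(X) = inf{t : P(X <= t) >= alpha} for a loss X, i.e. the alpha-quantile; other conventions (e.g. indexing by 1 - alpha) also appear in the literature, and the paper's Surface-at-Risk definition and its Gaussian-process bound are not reproduced here. `empirical_var` is a hypothetical name.

```python
import math
from typing import Sequence


def empirical_var(samples: Sequence[float], alpha: float) -> float:
    """Empirical alpha-quantile: smallest sample t with (#samples <= t) / n >= alpha."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if len(samples) == 0:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    n = len(ordered)
    # Smallest k with k / n >= alpha. The small tolerance absorbs float rounding
    # (e.g. 0.7 * 10 == 7.000000000000001); max(1, ...) keeps k a valid rank.
    k = max(1, math.ceil(alpha * n - 1e-9))
    return ordered[k - 1]


# Hypothetical discrepancy norms observed during operation.
losses = [float(i) for i in range(1, 11)]  # 1.0, 2.0, ..., 10.0
```

With ten samples, `empirical_var(losses, 0.7)` returns 7.0: at least 70% of the observed discrepancies are at or below that value.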
Bio-inspired learning has been gaining popularity recently, given that Backpropagation (BP) is not considered biologically plausible. Many algorithms that are more biologically plausible than BP have been proposed in the literature. However, apart from overcoming the biological implausibility of BP, a strong motivation for using Bio-inspired algorithms remains lacking. In this study, we undertake a holistic comparison of BP against multiple Bio-inspired algorithms to answer the question of whether Bio-learning offers benefits over BP beyond biological plausibility. We test Bio-algorithms under different design choices, such as access to only partial training data, resource constraints in terms of the number of training epochs, sparsification of the neural network parameters, and addition of noise to input samples. Through these experiments, we find two key advantages of Bio-algorithms over BP. First, Bio-algorithms perform much better than BP when the entire training dataset is not supplied: four of the five Bio-algorithms tested outperform BP by up to 5% accuracy when only 20% of the training dataset is available. Second, even when the full dataset is available, Bio-algorithms learn much more quickly and converge to a stable accuracy in far fewer training epochs than BP. Hebbian learning, specifically, is able to learn in just 5 epochs, compared to around 100 epochs required by BP. These insights present practical reasons for utilising Bio-learning beyond its biological plausibility and also point towards interesting new directions for future work on Bio-learning.
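For context on what a Hebbian update looks like, here is a minimal sketch of Oja's rule, a standard stabilized variant of Hebbian learning. The study does not state which Hebbian variant it used, so Oja's rule is a swapped-in textbook example, not the evaluated algorithm. Each update uses only the local input x and output y = w . x, with no backpropagated error; the plain Hebbian term y * x alone would let the weights grow without bound, and the -y^2 * w term keeps them near unit norm. `oja_train` is a hypothetical name.

```python
import numpy as np


def oja_train(data: np.ndarray, lr: float = 0.005, epochs: int = 5, seed: int = 0) -> np.ndarray:
    """Single linear neuron trained with Oja's rule on zero-mean data (rows are samples)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=data.shape[1])
    w /= np.linalg.norm(w)  # start from a random unit vector
    for _ in range(epochs):
        for x in data:
            y = w @ x                    # post-synaptic activity
            w += lr * y * (x - y * w)    # Hebbian term y*x plus Oja's decay term -y^2*w
    return w


# Hypothetical zero-mean data whose variance is largest along the first axis.
rng = np.random.default_rng(1)
data = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
w = oja_train(data)
```

On such data the weight vector converges (up to sign) to the leading principal direction, here close to the first axis, using purely local updates.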